Abstract

NASA space exploration should largely address a
problem class in reliability and risk management
stemming primarily from human error, system risk and
trade-off analysis, by conducting
research into system complexity, risk characterization
and modeling, and system reasoning. In general, in
every mission we can distinguish risk in three possible
ways: a) known-known, b) known-unknown, and c)
unknown-unknown. It is probable, almost certain, that
space exploration will partially experience known or
unknown risks similar to those embedded in the Apollo
missions, Shuttle or Station unless something alters
how NASA perceives and manages safety and
reliability.


1. Introduction 

Current and future NASA Exploration goals include 
missions with the most difficult, dangerous, and 
dynamic operations in history, ranging from Earth orbit 
operations to planetary and universe exploration. 
These missions, by their nature, push the limits of
human, technological, and theoretical knowledge 
boundaries. 

NASA continues to push technology limits with its 
missions, and exploration is not an exception. 
Increased system functionality often results in 
increased implementation complexity, making both 
success and affordability much harder to achieve. 
Moreover, it is accepted that not all human risks can be 
mitigated during design: Human reliability in systems 
cannot be verified with full coverage, and components
will fail or degrade, operators will make mistakes, and
operating environments are uncertain. In addition, the 
state of the system and its environment may 
dynamically increase control complexity or decrease 
reaction times such that traditional control means are 
inadequate. Developing critical technologies that
provide system resiliency will enable future systems to
adapt and recover from these unanticipated problems. 
Significant improvements in critical technologies will 
be required to reduce risk in future NASA missions. 

This paper will underline the scope and importance 
of human reliability in NASA space exploration for 
now and for the future. 

2. Successes and Failures 

NASA's record of mission success is strong. 
However, the ever-increasing complexity of NASA
missions will significantly challenge NASA's ability to
assure mission success. 

With all of its success, NASA has had failures that 
have cost billions of dollars, have lost opportunity for
scientific advancement, and tragically have
resulted in the loss of human life. Notable failures
include the first manned Apollo flight in 1967 which
resulted in three fatalities, the Space Shuttle Columbia
in 2003 which resulted in seven fatalities, the Space 
Shuttle Challenger launch in 1986 which resulted in 
seven fatalities, and the Mars Climate Orbiter and 
Polar Lander missions in 1999 which cost more than 
$1.5B. In the period of 1986 to 2001, the top ten 
NASA failures cost around $9.6B, with half of that 
cost due to the loss of the Space Shuttle Challenger. 
NASA is not alone in experiencing such failures during
that time period, with estimates of total U.S. space
mission failures costing $18.6B and worldwide space
mission failures costing $31.1B. Rates of failure for
U.S. launch vehicles (NASA, DoD and commercial)
have been estimated to be 7.6% for the period of 1985 
to 1999. The costs of failure are high, and the rates of 
failure are not appreciably improving. 

The need for improving risk management is 
recommended as the highest priority by many NASA 
internal and independent studies and commissions. 
The Faster Better Cheaper Final Task report 
recommends that missions develop and maintain
"Programmatic and Mission Risk Signatures," making
risks and risk countermeasures visible to all inside and
outside the project (FBC, p. 4). The NIAT report
(Enhancing Mission Success) recommends that NASA
"Improve and enhance NASA and contractor
knowledge and ability to identify, assess, mitigate, and
track risk through the definition of success criteria,
acceptable risk, utilization of existing and new tools,
and proper policy and guidance" (NIAT-7, p. 42). The
NIAT cites 31 separate recommendations supporting
enhanced risk management among the recent mishap
reports.

NASA has taken major steps toward managing its 
risks. Reviews of recent major mishaps by
the CAIB, NIAT and others (e.g., Mars Climate
Orbiter, Lewis Spacecraft, Mars Polar Lander, Wide- 
Field Infrared Explorer, and the V-22 rotorcraft), and 
various strategic agency analyses (e.g., Shuttle
Independent Assessment Team, the Faster, Better, 
Cheaper Task Report, and the U.S.A.F. Broad Area 
Review of 1999) have identified the critical need for 
NASA and the U.S. aerospace industry to significantly 
retool its engineering processes and capabilities. 

3. Space exploration and human error 

In every mission we can distinguish risk in three
possible ways: a) known-known: we know the risk
and have retired it, b) known-unknown: we know that
there is a risk and the risk is modeled, and c)
unknown-unknown: we don't even know there is a risk.
Exploration is about diving into the unknown-unknown.

3.1 Risk mitigation for the probable future

Mishap analyses continue to highlight humans as 
contributing factors to mishaps. For example, poor 
knowledge management contributed to the Ariane 5 
and Space Shuttle SSME repair mishaps. Hubble and 
other case studies such as Challenger and Columbia
also point to management and cultural issues as key
factors in mishaps. Some argue that humans are always
involved in mishaps and assert that the causes of
mishaps are frequently, if not almost always, rooted in
the organization: its culture, management, and
structure. However, it is insufficient to focus 
exclusively on social and organizational factors; how 
these relate systematically to technology development, 
deployment, and use is also important. 

Analyses of recent major mishaps (e.g., Mars
Climate Orbiter, Mars Polar Lander, and Wide-Field
Infrared Explorer), various strategic agency analyses 
(e.g., NASA Integrated Action Team, Shuttle 
Independent Assessment Team), and various case 
studies have consistently identified knowledge 
management and humans as contributing causes to 
mishaps. 

An Aerospace study examined nearly 4000 launches
from 1957 onward. Based on the analysis results, the study
recommends enhancements for launch vehicles,
including avionics redundancy, software and integrated
system testing. Human-in-the-loop processes, software
function and propulsion and flight control subsystem
failures ranked high as initiators of mishaps. The
mishap data presented in a Boeing report indicates that 
loss of control in flight is the leading cause of fatal 
accidents and controlled flight into the terrain is the 
second leading cause in commercial airline accidents. 
Contributing factors in these accidents included 
software errors, component failures, improper human- 
machine interactions (poor training), operator error on- 
board or on the ground (sometimes due to an 
uninformed operator) and unanticipated operating 
environments. In fact, flight crews expressed the desire 
for real-time, onboard integrated diagnostics that 
provide "answers not just clues" to the causes of
anomalous conditions occurring during flight.
Nonintegrated caution-warnings are not sufficient
because the crew is responsible for cognitive
integration that takes precious minutes and could mean
the difference between life and death.

The identification of technology is a major part of a 
solution set that will have significant impact on 
mitigating the risk in future NASA missions. In
general, poor requirements specification and system
verification, together with rigid operations and control
systems, have been inferred as major causes of mission
mishaps. Current
technologies are not optimal for carrying out
effective risk mitigation strategies as they lack
significant capability to assess system condition or
to validate system performance. System robustness,
redundancy and capability for rapid recovery are
currently inadequate.

It is probable, almost certain, that exploration will
partially experience known or unknown risks similar to
those embedded in the Apollo missions, Shuttle or Station
unless something alters how the enterprise
perceives and manages safety and reliability.



Figure 1: In every mission we can distinguish risk in three
possible ways: a) known-known: we know the risk and
have retired it, b) known-unknown: we know that there is a
risk and the risk is modeled, and c) unknown-unknown: we
don't even know there is a risk. Exploration is about diving
into the unknown-unknown.

In order to establish technology development
priorities for risk mitigation in future space exploration
missions, the focus should be on three main goals (types
of risk): 1) determination of the major
problem classes involved in NASA missions, thereby
mastering the known risks in mission design, 2)
determination of precursors of risk (latencies), and 3)
identification of processes to capture and model new
risk. Figure 1 illustrates the lifecycle of risk.
Exploration (unknown-unknown) brings new 
experience where new risk is modeled and understood 
(known-unknown). Capabilities can then transform 
and retire the risk (known-known). 
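
The risk lifecycle described above can be read as a small state machine. The sketch below is illustrative only: the names RiskState, Risk and advance are hypothetical and are not taken from any NASA tool. It encodes just the two transitions named in the text, from unknown-unknown to known-unknown, and from known-unknown to known-known.

```python
from dataclasses import dataclass
from enum import Enum


class RiskState(Enum):
    UNKNOWN_UNKNOWN = "unknown-unknown"  # risk not yet known to exist
    KNOWN_UNKNOWN = "known-unknown"      # risk identified and modeled
    KNOWN_KNOWN = "known-known"          # risk understood and retired


# Allowed transitions, following the lifecycle described in the text.
NEXT_STATE = {
    # Exploration brings experience; the new risk is modeled and understood.
    RiskState.UNKNOWN_UNKNOWN: RiskState.KNOWN_UNKNOWN,
    # Capabilities transform and retire the risk.
    RiskState.KNOWN_UNKNOWN: RiskState.KNOWN_KNOWN,
}


@dataclass(frozen=True)
class Risk:
    name: str  # hypothetical label for illustration
    state: RiskState = RiskState.UNKNOWN_UNKNOWN


def advance(risk: Risk) -> Risk:
    """Return the risk moved one step along the lifecycle.

    A retired (known-known) risk has no further transition.
    """
    if risk.state not in NEXT_STATE:
        raise ValueError(f"{risk.name} is already retired")
    return Risk(risk.name, NEXT_STATE[risk.state])
```

Keeping the transitions in a single table makes the one-directional flow explicit: a risk cannot be retired without first being modeled.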

NASA space exploration should largely address a 
problem class in reliability and risk management 
stemming primarily from human error, system risk and 
multi-objective trade-off analysis, by conducting
research into system complexity, risk characterization
and modeling, and system reasoning. The state of the
art and foundation in this category appear to be well
established, but the resulting mishaps and catastrophes
demonstrate otherwise. There are clearly
things missing or not addressed in today's state-of-the-art
approach, either in technology or organizations.
Development activity will have to support risk 
analysis, design robustness, failure modeling, and
system trade-offs throughout the entire lifecycle of the 
enterprise, with particular emphasis on early-phase 
capabilities. Today we can only imagine the state of
the art of safety and mission assurance. Although we
do utilize modern tools and techniques to analyze our
systems and make risk decisions, these present 
methods have significant limitations. 

3.2 The need to manage risk for sustainable 
exploration 

Various studies have also highlighted the need to 
continuously manage risk throughout the life cycle. 
Studies also uncovered the fact that risk analysis 
should not only be conducted linearly but recursively 
as well. This means that changes in the later phases of 
a program require revalidation of earlier life cycle 
phases, such as requirements, design, etc., in order to 
fully quantify potential risk exposure. 
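
As a minimal sketch of the revalidation idea above, the function below lists which earlier lifecycle phases would need to be revisited after a change in a later phase. The phase list is an assumption made for illustration (the text names only requirements and design), and phases_to_revalidate is a hypothetical helper, not an established process.

```python
# Assumed phase ordering; only "requirements" and "design" come from the text.
PHASES = ["requirements", "design", "implementation", "test", "operations"]


def phases_to_revalidate(changed_phase: str) -> list[str]:
    """Return the phases that precede the changed phase, in lifecycle order.

    A change late in the program sends the risk analysis back over every
    earlier phase, rather than only forward from the point of change.
    """
    if changed_phase not in PHASES:
        raise ValueError(f"unknown phase: {changed_phase}")
    return PHASES[: PHASES.index(changed_phase)]
```

For example, under this assumed ordering a change during test would flag requirements, design and implementation for revalidation, while a change to requirements would flag nothing earlier.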

NASA has to invest in a variety of defenses that 
constitute reliability for exploration. Such defenses 
include physical barriers, personal protective 
equipment, procedure design, training, and the like. A 
variety of partially redundant defenses typically 
intervene between a hazard and the losses that would 
ensue if that hazard occurred. It is only when all the
defenses are unfortunately aligned and therefore
penetrated that a mishap occurs.
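
As a rough illustration of the layered-defense argument above, the sketch below computes the chance that a hazard penetrates every defense. It assumes, purely for simplicity, that the defenses fail independently; the text describes them as only partially redundant, so real defenses are correlated and this figure would be optimistic. The function name and the probabilities are made-up values for illustration, not NASA data.

```python
from math import prod


def mishap_probability(failure_probs):
    """Probability that all defenses fail together, assuming independence.

    failure_probs holds one failure probability per defense, each in [0, 1].
    """
    probs = list(failure_probs)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be in [0, 1]")
    # With no defenses at all, the hazard always leads to a loss (product = 1).
    return prod(probs)


# Hypothetical failure probabilities: physical barrier, procedure, training.
example_defenses = [0.1, 0.2, 0.5]
print(mishap_probability(example_defenses))
```

Under the independence assumption each added defense multiplies the mishap probability down, which is why a mishap requires every layer to be penetrated at once.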

There are four latent conditions critical to space
exploration: the distributed nature of work, the
heterogeneous nature of work, data overload, and the
presence of advanced automation and decision support 
information technologies. These four elements in part 
define the complexity of work that gives rise to a 
variety of risk situations. 

In general, when work is distributed across space 
and time among multiple people, certain latent 
conditions necessarily exist that may lead to future 
mishaps. These include information sharing, 
coordination, communication, procedures, training, and 
knowledge capture and reuse. Information sharing 
may be absent, incomplete, incorrect, or not done in a
timely manner. Coordination activities may be
disorganized, untimely, missing, or unnecessarily 
difficult for a particular organizational structure. Poor 
communication practices, inappropriate initial framing 
of the interaction, poor training, and poor procedure 
design may lead to poor information sharing and 
coordination, which may directly lead to mishaps, or 
indirectly create 'deeper' latent conditions of mistrust
or inappropriate group norms among members of the 
organization. Distributed work also requires 
distributed knowledge; therefore, poor knowledge 
capture and lack of reuse are issues as well. 



A second latent condition is the heterogeneity of 
work. The complexity of NASA missions demands the
integration of heterogeneous skills and knowledge 
from a diverse workforce. For example, power 
systems engineering, structural engineering, orbital 
mechanics, astrophysics, computer architecture, data 
handling, and flight dynamics are just some of the 
technical disciplines involved in building and operating 
spacecraft (to say nothing of the difference between 
engineering and management). These disciplines 
constitute “micro-cultures” with their own norms, 
jargon, and styles of interaction that may be 
incompatible and lead to misunderstandings and 
failure. 

A third latent condition is the proliferation of data 
and information. The greater quantity and 
specialization of work, as well as the greater quantity 
and sophistication of information technologies 
described below, lead to a serious data overload
problem for human practitioners. Information
technologies themselves are often developed with the
aim of reducing data overload by providing tools for
information fusion, decision support and automation. 
While often successful at one level, at another level 
such technologies add to the complexity of work. 

Automation and decision support information 
technologies are another necessary facet of NASA 
missions that is the basis for a number of latent 
conditions. Such information technologies play into 
this analysis in a number of ways. Automated systems 
perform many functions in the operational 
environment. Engineering design and analysis relies 
on any number of software packages. Training 
systems are often implemented as interactive 
computer-based exercises. Simulations are used for 
engineering design and training and can and should be 
used as an aid to develop requisite imagination. Of 
particular interest is the use of model-based 
simulations to support vehicle or system design. A 
key problem that faces NASA is the proliferation of
models and tools for high-fidelity but isolated
component models and the associated lack of an
integrating framework so that models can be used
together effectively. Questions of what model fidelity
is good enough for different kinds of decision making 
and how integrated models can support more effective 
team and organizational problem solving are critical 
issues. 


3.3 Managing the known-unknown or risk 
mitigation 

The intersection of distributed work, heterogeneous 
work, data overload, and technology is useful to 
consider explicitly in the context of "poor knowledge 
management." A study on why knowledge 
management is difficult found four major factors: (1) 
Ignorance (people don't know that what they know is
useful, or that somebody else knows something
useful), (2) No "absorptive capacity" (people don't
have the time, money and management resources to
explore and reuse others' knowledge), (3) Lack of
preexisting relationships (knowledge flows best
between people who know, respect, and like each
other), and (4) Lack of motivation (people don't
perceive value added). A fifth commonly cited factor
is clumsy knowledge management technology;
yet, as many authors have pointed out, the hard 
problems are really always with "the people." 

Space exploration should address the development and
utilization of a variety of tools, processes, and capabilities,
largely around a framework of information
technology. Means for better modeling and
characterization of risk (both human and historical -
design/ops phase) and its relationship to complexity and
known or potential anomalies, visualization
(understanding) of risk and risk profiles, integration
methods, particularly those for integrating highly disparate
models, and tools enabling the utilization of risk models in
active design trades are examples of capabilities
instantiated in IT.

Space exploration investments in new technologies
to support full-lifecycle, integrated risk management are
critical for successful missions. These will support the
development of integrated risk management tools and
risk profiling capabilities instantiated in multiple
categories:

Category 1 - Development of tools for identifying,
assessing and trading 'risk' before and during
formulation; improvements in risk management from
the outset will yield benefits throughout the mission
lifecycle.

Category 2 - Development of safety and risk-related
systems analysis tools combines two thrusts,
addressing a) how risk profiles can be maintained and 
utilized throughout the full lifecycle, and b) how 
system evolution affects designs. 



Category 3 - Development of methods and tools 
that constitute a human learning ‘feedback’ loop. 
Their goal is to improve our understanding of the 
factors that contribute to aerospace accidents and to 
develop ways to use that experience to improve 
designs. 

Space exploration should focus on short-term and long- 
term objectives. One key short-term objective is to 
realize the risk characterization of a mission/system 
model with full breadth and with depth significant 
enough to demonstrate the utility of a true risk-based 
design paradigm - that is, to characterize the system 
risk sufficiently and early enough to be fully traded in 
the design phase, along with and against other typical 
system attributes, and to fully define the benefits 
thereof. 

The long-term goal is to support the development 
and deployment of an integrated full-lifecycle 
capability that is infused into all infrastructures 
supporting the enterprise missions. This capability 
would improve our understanding of risk and risk 
precursors for all mission/system types and the
relationship between risk and complexity, provide for
full and complete modeling of risk at the system and
subsystem levels at all phases, and significantly
improve our ability to reliably design, build, and
operate Agency missions. In addition, the ability to
fully understand risks and the effectiveness of risk
mitigation (such as testing) can also help to optimize
the retirement of risk and, indirectly, to reduce the cost
and development time of missions.

4. A unique opportunity 

New exploration missions now have a renewed
opportunity to address risk with novel capabilities in
early phase design when compared to Shuttle and
Station; key developments supporting other phases,
particularly operations, will be fostered due to their
importance to key customers and because of the
invaluable insight that they provide regarding management
of the transition between phases and continuing model
maturity. As the new exploration projects progress,
the emphasis will move toward later parts of the 
lifecycle and the transition between phases that are 
necessary to maintain model integrity; the goal will be 
to eventually demonstrate a capability applicable to 
full-lifecycle design and operations. 

1. Model Based Risk Management involves
methods and tools to model accurate cause
and precursor identification, communication,
and learning, including taxonomies and
frameworks to enable comprehensive, 
comparative analyses and trending in mishap 
and anomaly reporting data for exploration 
missions, and information organization, 
analysis and visualization tools to facilitate 
and manage distributed processes. 

2. Risk Assessment involves the principal
research and component-level developments
supporting new methods of risk
characterization and visualization, risk
modeling, probabilistic assessment 
capabilities, and development of relevant 
historical and causal data supporting risk 
assessment and management. In addition, the 
scope will extend farther into the mission 
lifecycle in order to ensure compatibility with 
subsequent phases and avoid the phase- 
transition problems that have plagued other 
programs. 